Kubernetes the Hard, Hard Way: Part II

Worker Nodes, and Deploying Workloads

Published: 04/20/2026

In my previous post, I explained how I configured and launched the kubelet on my master node. With that in place, I was ready to bring up the control plane components. Most of the components were launched as static pods.

Below is the list of components that I launched before bringing up my worker nodes, in the order that I deployed them.

1. etcd
2. kube-apiserver
3. kube-controller-manager
4. kube-scheduler
5. kube-proxy
6. flannel
7. coredns

In order to get a head start on writing the yaml definitions for each of these components, I borrowed from kubeadm. I grabbed the kubeadm binary, scp’d it up to my node machine, and then ran kubeadm init phase control-plane in order to generate the control plane component yaml definitions. I then scp’d those yaml files back down to put them in my repo and edit them. For each component I created a script to use for launching it. In many cases, the script is a wrapper around kubectl apply, but I found it helpful to have a script to record how the thing was launched in case there were extra steps that I probably wouldn’t remember. Below are some notes on each component.

etcd

My etcd setup is simple. It’s running as a static pod with a hostpath volume mounted for the data directory. I know this isn’t a very resilient setup, but it works for my use-case here. Ideally, I think I’d want etcd running on its own VM, or for an HA setup it would run as a cluster on multiple VMs. But I’m not made of money, and this Kubernetes cluster is already way overkill for my tiny website, so static pod it is.
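
For reference, a static pod manifest along these lines looks roughly like the sketch below. The image tag, IP, and paths are placeholders rather than my exact values; the hostpath volumes for the data directory and certs are the important part. Dropping a file like this into the kubelet's static pod manifest directory (/etc/kubernetes/manifests by convention) is all it takes to launch it.

apiVersion: v1
kind: Pod
metadata:
  name: etcd
  namespace: kube-system
spec:
  hostNetwork: true
  containers:
  - name: etcd
    image: registry.k8s.io/etcd:3.5.12-0             # illustrative tag
    command:
    - etcd
    - --name=control-plane
    - --data-dir=/var/lib/etcd
    - --listen-client-urls=https://10.0.0.2:2379     # placeholder private IP
    - --advertise-client-urls=https://10.0.0.2:2379
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt
    - --key-file=/etc/kubernetes/pki/etcd/server.key
    - --client-cert-auth=true
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    volumeMounts:
    - name: etcd-data
      mountPath: /var/lib/etcd
    - name: etcd-certs
      mountPath: /etc/kubernetes/pki/etcd
      readOnly: true
  volumes:
  - name: etcd-data
    hostPath:                    # data lives on the node's disk; not resilient, but simple
      path: /var/lib/etcd
      type: DirectoryOrCreate
  - name: etcd-certs
    hostPath:
      path: /etc/kubernetes/pki/etcd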

kube-apiserver, kube-controller-manager, kube-scheduler

There’s not much to say for these components. I used a similar pattern again, launched each component as a static pod, and mounted the necessary certificates as hostpath volumes.
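
The recurring pattern, sketched here for the kube-apiserver, is just a hostpath volume for the pki directory mounted read-only into the container. This is an excerpt, with illustrative flags, tags, and paths rather than my exact values:

spec:
  hostNetwork: true
  containers:
  - name: kube-apiserver
    image: registry.k8s.io/kube-apiserver:v1.29.0    # illustrative tag
    command:
    - kube-apiserver
    - --etcd-servers=https://10.0.0.2:2379           # placeholder private IP
    - --client-ca-file=/etc/kubernetes/pki/ca.crt
    - --tls-cert-file=/etc/kubernetes/pki/apiserver.crt
    - --tls-private-key-file=/etc/kubernetes/pki/apiserver.key
    volumeMounts:
    - name: k8s-certs
      mountPath: /etc/kubernetes/pki
      readOnly: true
  volumes:
  - name: k8s-certs
    hostPath:
      path: /etc/kubernetes/pki
      type: Directory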

kube-proxy

With the above components in place, I could launch kube-proxy with kubectl apply. I tried two different ways of configuring it. In the first approach, I used certificates that I manually created, similar to the other control plane components, and that worked fine. Then I tried skipping the manual certificate generation and having kube-proxy authenticate using its service account. That approach also worked well, with one caveat: I had to make sure that kube-proxy was configured with the actual IP address of the kube-apiserver. Otherwise, kube-proxy tries to reach the kube-apiserver at its ClusterIP, but the iptables rules that would normally translate that ClusterIP to a real IP address don't exist yet, because creating them is kube-proxy's job.
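
The fix lives in the kubeconfig that kube-proxy mounts (from a ConfigMap in my setup): its server field has to point at the kube-apiserver's real address rather than the ClusterIP. Roughly, with the IP as a placeholder:

apiVersion: v1
kind: Config
clusters:
- name: default
  cluster:
    certificate-authority: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    server: https://10.0.0.2:6443    # the apiserver's actual IP, not the kubernetes ClusterIP
users:
- name: default
  user:
    tokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token    # service account auth
contexts:
- name: default
  context:
    cluster: default
    user: default
current-context: default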

flannel and coredns

In keeping with the goals of this project, I chose flannel for my CNI plugin because of its simplicity. To deploy flannel, I downloaded the manifest from the flannel repo and launched it with kubectl apply. I ran into a couple of small hiccups getting flannel running properly. First, I realized that the nf_conntrack kernel module, which iptables needs for NAT/connection tracking, wasn't loaded, so I loaded it. Then I noticed that tcp connections were timing out and failing. After a bit of debugging, I realized that flannel was not configured to use the private host IPs. My VMs all share a VPC, and I set them up to talk on the 10.0.0.0 network, but flannel was defaulting to the public IPs. I edited the flannel daemonset to use the internal IPs, and everything worked after that.
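
The flannel change boils down to one extra argument on the flanneld container in the daemonset, telling it which interface (or IP) to use for host-to-host traffic. Here's an excerpt of the container spec; the interface name and image tag are placeholders, and --iface also accepts an IP address:

      containers:
      - name: kube-flannel
        image: docker.io/flannel/flannel:v0.24.0   # illustrative tag
        args:
        - --ip-masq
        - --kube-subnet-mgr
        - --iface=eth1     # use the VPC-private interface instead of the default/public one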

I then downloaded the coredns manifest and launched it without any issue.

worker nodes

The control plane node was now up and functioning, and it was time to bring up the worker nodes. I reused the same startup script that I had used for my control plane to install binaries, certs, and configs, and to start up services. The process for the worker nodes was slightly different because, now that I had the control plane up, I could use the tls bootstrapping process described here and skip the manual cert generation that I did for the control plane. In this tls bootstrapping process, a worker node's kubelet starts up with a bootstrap kubeconfig that contains a unique bootstrap token. With that token, the kubelet can authenticate to the kube-apiserver and gain the minimal permissions it needs to create a certificate signing request (CSR) for a client certificate. Once that CSR is approved, the kubelet uses the resulting client certificate to fully authenticate with the kube-apiserver and gain the full permissions it needs for normal operation. I configured automatic approval for CSRs, so once the kubelet starts, it should be good to go.

The only caveat here is that this bootstrap process only works for client certificates. For server certificates, you can still use the bootstrap process to create server CSRs, but there isn't built-in support for automatically approving them. Because of that, I chose to still generate the server certs manually like I did before.
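
Concretely, the client-cert bootstrap comes down to two pieces: a bootstrap kubeconfig on each worker (pointed at by the kubelet's --bootstrap-kubeconfig flag), and RBAC bindings on the cluster side so that bootstrapping kubelets can create CSRs and have them auto-approved. A rough sketch, with the token, IP, and paths as placeholders:

# bootstrap kubeconfig; this file lives on the worker node
apiVersion: v1
kind: Config
clusters:
- name: bootstrap
  cluster:
    certificate-authority: /var/lib/kubelet/ca.crt
    server: https://10.0.0.2:6443          # placeholder apiserver address
users:
- name: kubelet-bootstrap
  user:
    token: abcdef.0123456789abcdef         # placeholder bootstrap token
contexts:
- name: bootstrap
  context:
    cluster: bootstrap
    user: kubelet-bootstrap
current-context: bootstrap
---
# applied to the cluster: let bootstrapping kubelets create CSRs...
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: create-csrs-for-bootstrapping
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:node-bootstrapper
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:bootstrappers
---
# ...and auto-approve their client certificate CSRs
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: auto-approve-csrs-for-group
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:certificates.k8s.io:certificatesigningrequests:nodeclient
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:bootstrappers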

deploying workloads

With the above steps done, I now had my cluster fully online. My next steps were to deploy some infrastructure components, specifically:

1. nginx gateway fabric
2. linode csi driver
3. cert manager
4. prometheus operator
5. grafana

nginx gateway fabric

I started with nginx gateway fabric, an implementation of the Kubernetes Gateway API. Its basic architecture is an nginx controller pod that monitors and controls the nginx data plane pods. Each data plane pod runs a container with an nginx process that routes ingress traffic, arriving via either a LoadBalancer service or a NodePort service, to the right backend service. In my case, since I'm doing this the hard way, there is no LoadBalancer service, so I have a dedicated load balancer server running nginx which routes traffic to one of my worker nodes. Here's the gist of what that nginx config looks like:

upstream k8_nodes {
    server 10.0.0.4:<NodePort port#>;
    server 10.0.0.5:<NodePort port#>;
}

server {
    server_name ak0.io www.ak0.io;

    location / {
        proxy_pass https://k8_nodes;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}

I did run into one problem with the above setup. I noticed that my website took 60 seconds to load every time, so I knew there was a timeout happening somewhere. What I realized was that, when nginx gateway fabric spun up, it created a data plane gateway pod on just one of my nodes. Additionally, nginx gateway fabric sets the externalTrafficPolicy on its NodePort service to Local by default, which means external traffic arriving on a node will not hop to another node. That's the default because the assumption is that you're using a cloud load balancer that already knows which nodes can serve the traffic, and keeping traffic local preserves the client source IP and avoids an extra hop. All of this meant that, for me, when a request came in on the node that didn't have the nginx data plane pod, it timed out just like I was seeing. I didn't want to add another hop to every request, so my solution was to scale up the nginx data plane deployment so that it runs on both nodes.
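
The scale-up itself is unremarkable, but to guarantee the two replicas actually land on different nodes it's worth spreading them explicitly. Here's a sketch of the relevant part of the data plane deployment spec; the label is a placeholder, since the real object names depend on how nginx gateway fabric names things for your Gateway:

spec:
  replicas: 2                                  # one data plane pod per worker node
  template:
    spec:
      affinity:
        podAntiAffinity:                       # keep the replicas on different nodes
          requiredDuringSchedulingIgnoredDuringExecution:
          - topologyKey: kubernetes.io/hostname
            labelSelector:
              matchLabels:
                app: nginx-gateway-dataplane   # placeholder label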

remaining workloads

In order to set up prometheus operator and grafana, I needed cert-manager (a requirement for prometheus operator) and the Linode CSI driver (my chosen storage backend for grafana). Both were relatively straightforward to set up, so I won't go into details.

Prometheus and grafana were also smooth setups. For prometheus, I set up a kubelet service monitor so that I could get basic cluster health metrics. That’s the only metric reporting that I’ve set up for now, aside from the instrumentation in my ak0.io webapp. The only major change that I had to make was to reduce the resource requests for prometheus and grafana since my nodes did not have enough available memory. I know this bears some risk of pods being OOM killed, but for my case I don’t expect any heavy load, and this has not caused any issues so far. Also, for grafana, I used the web UI to import the dashboard that I had previously created when I had my apps running in docker. I then just had to make a few tweaks to get the datasources pointing to the right place, and everything was up and running as before.
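
The kubelet service monitor is a small amount of yaml. This is roughly its shape; it assumes the prometheus operator was run with its --kubelet-service flag so that a kubelet Service with https-metrics ports exists in kube-system, and the namespace and labels here are placeholders:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kubelet
  namespace: monitoring                # placeholder namespace
  labels:
    k8s-app: kubelet
spec:
  jobLabel: k8s-app
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      k8s-app: kubelet
  endpoints:
  - port: https-metrics                # the kubelet's own metrics
    scheme: https
    interval: 30s
    bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    tlsConfig:
      insecureSkipVerify: true         # kubelet serving certs are self-managed in my setup
  - port: https-metrics                # cAdvisor container metrics
    scheme: https
    path: /metrics/cadvisor
    interval: 30s
    bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    tlsConfig:
      insecureSkipVerify: true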

application workloads

My last step was to launch my application workloads: ak0.io, which is this site you're on now, and jupiter coffee, which is my pretend coffee shop page that I made as a project to learn some frontend coding skills. Since I already had these running in docker containers, it was very low effort to get them up and running in the cluster. I made a manifest for each, which included a namespace and HTTP routing for nginx gateway fabric, et voilà.
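
Each app manifest has roughly the same shape: a namespace, the app's deployment and service (omitted here), and an HTTPRoute that attaches the app's hostnames to the shared Gateway. A sketch for ak0.io, with the gateway and service names as placeholders:

apiVersion: v1
kind: Namespace
metadata:
  name: ak0
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: ak0
  namespace: ak0
spec:
  parentRefs:
  - name: gateway                # placeholder: the shared Gateway managed by nginx gateway fabric
    namespace: nginx-gateway
  hostnames:
  - ak0.io
  - www.ak0.io
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /
    backendRefs:
    - name: ak0-web              # placeholder: the app's Service
      port: 80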

conclusion

I’m very glad I went through this whole process. The biggest thing I took away is just how much of Kubernetes comes down to networking. Nearly every issue I ran into, from flannel defaulting to the wrong IPs to the externalTrafficPolicy silently dropping traffic, was a networking problem. Debugging those issues forced me to get comfortable with tools and concepts like iptables rules, kernel modules, and packet tracing that are useful well beyond Kubernetes.